optimizations and formatting frontend and backend #114
Walkthrough
Frontend analyze pages now use a runtime-configurable backend base URL (NEXT_PUBLIC_API_URL) for /api/process, /api/bias, and /api/chat calls and add a console.log in chat handling. Backend changes are mostly formatting, add a bs4 extraction fallback in the scraper, and remove the backend/start.sh script.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor User
participant FE_Load as Analyze Loading Page (FE)
participant BE as Backend
User->>FE_Load: Open analyze loading
par
FE_Load->>BE: POST ${backend_url}/api/process { url: storedUrl }
BE-->>FE_Load: process result
and
FE_Load->>BE: POST ${backend_url}/api/bias { url: storedUrl }
BE-->>FE_Load: bias result
end
FE_Load-->>User: Navigate to results
```

```mermaid
sequenceDiagram
actor User
participant FE_Results as Analyze Results Page (FE)
participant BE as Backend
User->>FE_Results: Send chat message
FE_Results->>BE: POST ${backend_url}/api/chat { message }
BE-->>FE_Results: chat response
FE_Results->>FE_Results: console.log(data)
FE_Results-->>User: Render response
```

```mermaid
sequenceDiagram
participant Scraper as extractor
participant trafilatura
participant newspaper
participant bs4 as bs4/readability
Scraper->>trafilatura: try extract
trafilatura-->>Scraper: text? (if yes -> return)
Scraper->>newspaper: try extract
newspaper-->>Scraper: text? (if yes -> return)
Scraper->>bs4: try extract_with_bs4 (readability + BeautifulSoup)
bs4-->>Scraper: text? (if yes -> return)
Scraper-->>Scraper: return failure shape (if all fail)
```
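For reference, a minimal sketch of the fallback chain the diagram describes. The function name, return shape, and helper structure below are illustrative assumptions, not the repo's actual extractor class; only the failure payload mirrors the shape shown later in this review.

```python
# Illustrative trafilatura -> newspaper -> readability/bs4 fallback (names are assumptions).
import requests
import trafilatura
from newspaper import Article
from readability import Document
from bs4 import BeautifulSoup

def extract_text(url: str) -> dict:
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10).text

    # 1) trafilatura
    text = trafilatura.extract(html)
    if text:
        return {"url": url, "text": text}

    # 2) newspaper3k
    article = Article(url)
    article.download(input_html=html)
    article.parse()
    if article.text:
        return {"url": url, "text": article.text}

    # 3) readability + BeautifulSoup
    soup = BeautifulSoup(Document(html).summary(), "html.parser")
    text = soup.get_text(separator="\n").strip()
    if text:
        return {"url": url, "text": text}

    return {"url": url, "text": "", "error": "Failed to extract article."}
```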
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 5
🔭 Outside diff range comments (1)
frontend/app/analyze/loading/page.tsx (1)
104-132: Fix effect cleanup: intervals/timeouts are not cleared on unmount
The cleanup function returned inside runAnalysis is never registered with useEffect, so progressInterval leaks; the redirect timeout also isn't cleared. Capture the timer IDs in the effect scope and clear them in the useEffect cleanup.
Minimal changes within this block:
```diff
-      const stepInterval = setInterval(() => {
+      stepInterval = setInterval(() => {
         setCurrentStep((prev) => {
           if (prev < steps.length - 1) {
             return prev + 1;
           } else {
             clearInterval(stepInterval);
-            setTimeout(() => {
-              router.push("/analyze/results");
-            }, 2000);
+            redirectTimeout = setTimeout(() => {
+              router.push("/analyze/results");
+            }, 2000);
             return prev;
           }
         });
       }, 2000);

-      const progressInterval = setInterval(() => {
+      progressInterval = setInterval(() => {
         setProgress((prev) => {
           if (prev < 100) {
             return prev + 1;
           }
           return prev;
         });
       }, 100);
-
-      return () => {
-        clearInterval(stepInterval);
-        clearInterval(progressInterval);
-      };
```

Additions needed outside this range:
```tsx
// Declare in the outer scope of the useEffect
let stepInterval: ReturnType<typeof setInterval> | null = null;
let progressInterval: ReturnType<typeof setInterval> | null = null;
let redirectTimeout: ReturnType<typeof setTimeout> | null = null;

// And return a proper cleanup from the useEffect:
return () => {
  if (stepInterval) clearInterval(stepInterval);
  if (progressInterval) clearInterval(progressInterval);
  if (redirectTimeout) clearTimeout(redirectTimeout);
};
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- frontend/app/analyze/loading/page.tsx (2 hunks)
- frontend/app/analyze/results/page.tsx (4 hunks)
- frontend/app/page.tsx (11 hunks)
🔇 Additional comments (6)
frontend/app/analyze/loading/page.tsx (1)
75-80: Ensure backend CORS is properly configured for cross-origin requests
Switching to a separate backend origin (process.env.NEXT_PUBLIC_API_URL) means your server must explicitly allow requests from your frontend. Verify that both /api/process and /api/bias endpoints return the appropriate CORS headers:
- Access-Control-Allow-Origin set to your frontend's origin (or * if that aligns with your security model)
- Access-Control-Allow-Headers including at least Content-Type (and any custom/auth headers you use)
- Access-Control-Allow-Credentials: true if you plan to send cookies or authentication tokens

I wasn't able to confirm a .env.example entry for NEXT_PUBLIC_API_URL. Please document this variable in your .env.example so that team members know to configure the correct backend URL.

frontend/app/analyze/results/page.tsx (4)
7-7: Import consolidation LGTM
Consolidated lucide-react import is clean and consistent.
49-51: Minor: isLoading gate set LGTM
Setting isLoading to false only when both artifacts exist is fine; the later effect rehydrates and guards redirect.
67-75: Rehydration and redirect flow LGTM
The guard ensures consistent state and redirects if session data is missing.
112-114: Return block change LGTM
Formatting-only change around the return; no functional impact.
frontend/app/page.tsx (1)
1-301: Formatting-only changes LGTM
No functional differences introduced. The reflowed imports/strings and minor JSX formatting are consistent with Prettier/Ruff output.
```tsx
const backend_url = process.env.NEXT_PUBLIC_API_URL;
```
🛠️ Refactor suggestion
Normalize base URL and add safe fallback to same-origin
Using process.env on the client is inlined at build time. If NEXT_PUBLIC_API_URL isn’t set, the current code builds "undefined/api/..." and issues a bad relative request. Also, trailing slashes can produce double "///". Normalize and provide a safe fallback.
Apply:
```diff
-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+// Normalize base URL (strip trailing slashes). Fallback to same-origin in browser.
+const backendUrl = (process.env.NEXT_PUBLIC_API_URL ?? "").replace(/\/+$/, "");
+const apiUrl = (path: string) =>
+  new URL(path, backendUrl || window.location.origin).toString();
```

Optionally, centralize apiUrl in a shared util (e.g., src/lib/api.ts) to DRY across pages.
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 19 to 20, the build-time
inlined NEXT_PUBLIC_API_URL can be undefined and may include trailing slashes,
causing requests like "undefined/api/..." or double slashes; change to compute a
normalized base URL with a safe same-origin fallback: if NEXT_PUBLIC_API_URL is
falsy use an empty string (or '/') for same-origin, trim any trailing slashes
from the env value, and ensure you always join paths safely (e.g., prepend a
single '/' when constructing endpoint paths or use the URL constructor).
Optionally move this logic into a shared util (src/lib/api.ts) and export a
normalized apiUrl to reuse across pages.
```tsx
axios.post(`${backend_url}/api/process`, {
  url: storedUrl,
}),
axios.post(`${backend_url}/api/bias`, {
  url: storedUrl,
}),
```
🛠️ Refactor suggestion
Use the normalized apiUrl helper to avoid double slashes and bad "undefined" paths
This ensures correctness regardless of env presence and trailing slashes.
```diff
-        axios.post(`${backend_url}/api/process`, {
-          url: storedUrl,
-        }),
-        axios.post(`${backend_url}/api/bias`, {
-          url: storedUrl,
-        }),
+        axios.post(apiUrl("/api/process"), { url: storedUrl }),
+        axios.post(apiUrl("/api/bias"), { url: storedUrl }),
```
🤖 Prompt for AI Agents
In frontend/app/analyze/loading/page.tsx around lines 75 to 80, the code builds
endpoints with string concatenation `${backend_url}/api/...` which can produce
double slashes or "undefined" paths; replace those concatenations with the
normalized apiUrl helper (import it if not already) and call
apiUrl('/api/process') and apiUrl('/api/bias') (or equivalent) when creating the
axios.post targets so environment values and trailing slashes are handled
consistently.
```tsx
import BiasMeter from "@/components/bias-meter";
import axios from "axios";

const backend_url = process.env.NEXT_PUBLIC_API_URL;
```
🛠️ Refactor suggestion
Normalize base URL and add apiUrl helper (same as loading page)
Prevent "undefined/api/..." and double slashes; provide same-origin fallback.
```diff
-const backend_url = process.env.NEXT_PUBLIC_API_URL;
+const backendUrl = (process.env.NEXT_PUBLIC_API_URL ?? "").replace(/\/+$/, "");
+const apiUrl = (path: string) =>
+  new URL(path, backendUrl || window.location.origin).toString();
```
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around line 22, the current const
backend_url = process.env.NEXT_PUBLIC_API_URL; can produce "undefined/api/..."
or double slashes and lacks a same-origin fallback; replace this with the same
normalization used in the loading page by creating an apiUrl helper that reads
NEXT_PUBLIC_API_URL, falls back to '' (same-origin) when unset, trims any
trailing slash, and returns a base string you can append endpoints to without
introducing duplicate slashes or "undefined"; update usages in this file to call
that helper instead of using backend_url directly.
```diff
 async function handleSendMessage(e: React.FormEvent) {
   e.preventDefault();
   if (!message.trim()) return;
   const newMessages = [...messages, { role: "user", content: message }];
   setMessages(newMessages);
   setMessage("");

-  const res = await axios.post("http://Thunder1245-perspective-backend.hf.space/api/chat", {
-    message: message
-  });
-  const data = res.data;
-
-  console.log(data)
-
-  // 🔹 Step 2: Append LLM’s response
-  setMessages([...newMessages, { role: "assistant", content: data.answer }]);
-};
+  const res = await axios.post(`${backend_url}/api/chat`, {
+    message: message,
+  });
+  const data = res.data;
+
+  console.log(data);
+
+  // 🔹 Step 2: Append LLM’s response
+  setMessages([...newMessages, { role: "assistant", content: data.answer }]);
+}
```
Add error handling and dev-only logging in chat handler; use apiUrl helper
Current code throws on network errors and logs verbosely in production. Guard with try/catch and normalize the endpoint.
- async function handleSendMessage(e: React.FormEvent) {
+ async function handleSendMessage(e: React.FormEvent) {
e.preventDefault();
if (!message.trim()) return;
const newMessages = [...messages, { role: "user", content: message }];
setMessages(newMessages);
setMessage("");
-
- const res = await axios.post(`${backend_url}/api/chat`, {
- message: message,
- });
- const data = res.data;
-
- console.log(data);
-
- // 🔹 Step 2: Append LLM’s response
- setMessages([...newMessages, { role: "assistant", content: data.answer }]);
+ try {
+ const { data } = await axios.post(apiUrl("/api/chat"), { message });
+ if (process.env.NODE_ENV !== "production") {
+ console.debug("chat response:", data);
+ }
+ setMessages([
+ ...newMessages,
+ { role: "assistant", content: data.answer },
+ ]);
+ } catch (error) {
+ console.error("Failed to send chat message:", error);
+ setMessages([
+ ...newMessages,
+ {
+ role: "assistant",
+ content: "Sorry, I couldn’t reach the server. Please try again.",
+ },
+ ]);
+ }
 }

Optional: add an isSending flag to disable the form while awaiting the response.
🤖 Prompt for AI Agents
frontend/app/analyze/results/page.tsx around lines 78 to 94: the chat submit
handler currently calls the backend directly, logs unconditionally, and can
throw on network errors; wrap the request in a try/catch, use the apiUrl helper
to build the endpoint (e.g. apiUrl('/api/chat')) instead of concatenating
backend_url, and add an isSending state flag set true before the request and
false in finally to disable the form while awaiting the response; in the catch
block gracefully handle errors (set isSending false, optionally set an error
state or show a toast) and only perform console.log/debug logging when
process.env.NODE_ENV === 'development'.
```tsx
    const res = await axios.post(`${backend_url}/api/chat`, {
      message: message,
    });
    const data = res.data;

    console.log(data);

    // 🔹 Step 2: Append LLM’s response
    setMessages([...newMessages, { role: "assistant", content: data.answer }]);
  }
```
💡 Verification agent
🧩 Analysis chain
Ensure backend CORS covers chat endpoint
Switching to a separate origin requires proper CORS on /api/chat similar to other endpoints.
🏁 Script executed:
```bash
#!/bin/bash
# Sanity-check we migrated all frontend chat/process/bias endpoints to use NEXT_PUBLIC_API_URL
rg -n --hidden --glob '!.git' '(axios\.post|fetch)\(.*?/api/(chat|process|bias)' -A1 -B2
# Show any remaining hard-coded http(s):// endpoints
rg -n --hidden --glob '!.git' 'https?://.*/api/(chat|process|bias)' -A0 -B0
```

Length of output: 986
Ensure backend CORS covers the chat endpoint
All frontend calls to /api/chat, /api/process, and /api/bias now use the backend_url environment variable, so they run against a separate origin. You must update your backend’s CORS configuration to allow requests from your Next.js app’s origin.
Please verify the CORS settings for the following locations:
- frontend/app/analyze/results/page.tsx (around lines 85–94)
- frontend/app/analyze/loading/page.tsx (around lines 73–79)
Make sure the /api/chat endpoint’s CORS policy mirrors the configuration used for /api/process and /api/bias.
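If the backend is FastAPI (an assumption; adapt to whatever framework backend/main.py actually uses), a single app-wide CORS middleware registration would cover /api/chat together with /api/process and /api/bias. The origins and credentials policy below are placeholders, not values taken from this PR:

```python
# Sketch only: adjust origins, methods, and credentials to match your deployment.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",          # local Next.js dev server (assumed port)
        "https://your-frontend.example",  # placeholder for the deployed frontend origin
    ],
    allow_methods=["POST", "OPTIONS"],    # preflight (OPTIONS) must be allowed
    allow_headers=["Content-Type"],
    allow_credentials=False,              # set True only if cookies/auth headers are sent
)
```

Because the middleware applies app-wide, /api/chat automatically gets the same policy as the other endpoints.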
🤖 Prompt for AI Agents
In frontend/app/analyze/results/page.tsx around lines 85 to 94 the frontend now
calls the backend via backend_url for /api/chat so the backend must permit
cross-origin requests from the Next.js app origin; update the backend CORS
configuration for the /api/chat route to include the same allowed origin(s),
methods, and headers used for /api/process and /api/bias (or reuse that
middleware/config), ensure credentials setting and preflight (OPTIONS) handling
match, and redeploy so requests from the frontend origin succeed.
Actionable comments posted: 5
🔭 Outside diff range comments (2)
backend/app/modules/scraper/extractor.py (2)
26-26: Critical: headers passed incorrectly to requests.get (being treated as query params).
Pass headers via the named argument; otherwise the custom User-Agent is never sent.
```diff
-        res = requests.get(self.url, self.headers, timeout=10)
+        res = requests.get(self.url, headers=self.headers, timeout=10)
```
24-31: SSRF hardening recommended for URL fetches.
If the URL originates from user input, add guards (scheme allowlist http/https, block private IPs/localhost, DNS pinning, redirect limits) to avoid SSRF.
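A rough illustration of the kind of pre-fetch guard this refers to; the function name and policy are assumptions, and this sketch does not cover DNS pinning or redirect limits:

```python
# Hypothetical check: allow only http/https schemes and public IP addresses.
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if not host:
        return False
    try:
        # Resolve the host and reject private, loopback, link-local, or reserved addresses.
        for info in socket.getaddrinfo(host, None):
            ip = ipaddress.ip_address(info[4][0])
            if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
                return False
    except (socket.gaierror, ValueError):
        return False
    return True
```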
🧹 Nitpick comments (8)
README.md (3)
137-145: Fix list indentation, fenced code language, stray asterisk, and .env spacing
- MD007: remove extra list indentation.
- MD040: specify a language for fenced block.
- Typo: “Setup environment variables:*” should be italic like other sections.
- .env: remove spaces around “=”.
- Spacing: add space in “/frontend directory”.

````diff
-Setup environment variables:*
-  - add .env file in `/frontend`directory.
-  - add following environment variable in your .env file.
-```
-NEXT_PUBLIC_API_URL = http://localhost:8000
-
-```
+*Setup environment variables:*
+- add .env file in `/frontend` directory.
+- add the following environment variable in your `.env` file.
+```env
+NEXT_PUBLIC_API_URL=http://localhost:8000
+```
````
142-142: Clarify dev vs prod URL
Consider adding a brief note that http://localhost:8000 is for local development; use your deployed backend's HTTPS URL in production to avoid mixed content/CORS issues.
163-170: Align backend env block with markdownlint and dotenv conventions
- MD007: remove extra list indentation.
- MD040: add language to the fenced block.
- .env: remove spaces around “=” and fix spacing in “/backend directory”.
- Remove stray indentation before closing fence.

````diff
-  - add .env file in `/backend`directory.
-  - add following environment variable in your .env file.
-  ```
-GROQ_API_KEY= <groq_api_key>
-PINECONE_API_KEY = <your_pinecone_API_KEY>
-PORT = 8000
-SEARCH_KEY = <your_Google_custom_search_engine_API_key>
-  ```
+- add .env file in `/backend` directory.
+- add the following environment variables in your `.env` file.
+```env
+GROQ_API_KEY=<groq_api_key>
+PINECONE_API_KEY=<your_pinecone_API_KEY>
+PORT=8000
+SEARCH_KEY=<your_Google_custom_search_engine_API_key>
+```
````

Optional: include a .env.example with these keys and ensure .env is gitignored.

backend/app/modules/scraper/extractor.py (3)
17-22: UA header OK; consider configurability and locale hints.
Make the UA configurable (env/config) and consider adding an Accept-Language to improve extraction on locale-sensitive sites.
```diff
         self.headers = {
             "User-Agent": (
                 "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
                 " AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/113.0 Safari/537.36"
             )
-        }
+            ,
+            "Accept-Language": "en-US,en;q=0.9",
+        }
```
71-76: Prefer lxml parser when available; normalize text to reduce noise.
html.parser works, but lxml is faster and more robust if installed. Also consider trimming empty lines to reduce boilerplate.
```diff
-        soup = BeautifulSoup(doc.summary(), "html.parser")
-        title = doc.title()
-        text = soup.get_text(separator="\n")
+        try:
+            soup = BeautifulSoup(doc.summary(), "lxml")
+        except Exception:
+            soup = BeautifulSoup(doc.summary(), "html.parser")
+        title = doc.title()
+        raw = soup.get_text(separator="\n")
+        text = "\n".join(line.strip() for line in raw.splitlines() if line.strip())
```
92-92: Standardize the failure payload shape for easier client handling.
Consider returning consistent keys even on failure.
```diff
-        return {"url": self.url, "text": "", "error": "Failed to extract article."}
+        return {
+            "url": self.url,
+            "title": "",
+            "text": "",
+            "authors": [],
+            "publish_date": None,
+            "error": "Failed to extract article.",
+        }
```

backend/app/db/vector_store.py (1)
27-27: Formatting change is fine; consider making provider/region configurable
Single-line spec is OK. For multi-region/multi-cloud deployments, consider sourcing cloud/region from env (e.g., PINECONE_CLOUD, PINECONE_REGION) with sane defaults.
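For example, something along these lines; PINECONE_CLOUD and PINECONE_REGION are suggested variable names, not ones the repo currently defines:

```python
# Sketch: env-driven ServerlessSpec with defaults (variable names are suggestions).
import os
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud=os.getenv("PINECONE_CLOUD", "aws"),
    region=os.getenv("PINECONE_REGION", "us-east-1"),
)
```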
backend/app/modules/vector_store/chunk_rag_data.py (1)
48-53: Avoid recreating fact_fields on every loop; add a type check for fact
- Defining fact_fields inside the loop recreates the list for every iteration; hoist it once above the loop.
- Add a clear type check to avoid TypeErrors when fact isn't a dict.

Define once above the loop:
```python
fact_fields = ["original_claim", "verdict", "explanation", "source_link"]
for i, fact in enumerate(data["facts"]):
    if not isinstance(fact, dict):
        raise ValueError(f"Fact at index {i} must be a dict")
    for field in fact_fields:
        if field not in fact:
            raise ValueError(f"Missing required fact field: {field} in fact index {i}")
```

Within this hunk, remove the per-iteration assignment and add the type check:
```diff
-        fact_fields = ["original_claim", "verdict", "explanation", "source_link"]
-        for field in fact_fields:
+        if not isinstance(fact, dict):
+            raise ValueError(f"Fact at index {i} must be a dict")
+        for field in fact_fields:
             if field not in fact:
                 raise ValueError(
                     f"Missing required fact field: {field} in fact index {i}"
                 )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
- README.md (2 hunks)
- backend/app/db/vector_store.py (2 hunks)
- backend/app/modules/bias_detection/check_bias.py (2 hunks)
- backend/app/modules/chat/embed_query.py (0 hunks)
- backend/app/modules/chat/get_rag_data.py (1 hunks)
- backend/app/modules/chat/llm_processing.py (2 hunks)
- backend/app/modules/facts_check/web_search.py (1 hunks)
- backend/app/modules/langgraph_builder.py (3 hunks)
- backend/app/modules/langgraph_nodes/error_handler.py (1 hunks)
- backend/app/modules/langgraph_nodes/fact_check.py (1 hunks)
- backend/app/modules/langgraph_nodes/generate_perspective.py (2 hunks)
- backend/app/modules/langgraph_nodes/sentiment.py (1 hunks)
- backend/app/modules/langgraph_nodes/store_and_send.py (1 hunks)
- backend/app/modules/scraper/cleaner.py (4 hunks)
- backend/app/modules/scraper/extractor.py (5 hunks)
- backend/app/modules/scraper/keywords.py (2 hunks)
- backend/app/modules/vector_store/chunk_rag_data.py (1 hunks)
- backend/app/modules/vector_store/embed.py (1 hunks)
- backend/app/routes/routes.py (0 hunks)
- backend/app/utils/fact_check_utils.py (1 hunks)
- backend/app/utils/store_vectors.py (1 hunks)
- backend/main.py (2 hunks)
- backend/start.sh (0 hunks)
💤 Files with no reviewable changes (3)
- backend/app/modules/chat/embed_query.py
- backend/app/routes/routes.py
- backend/start.sh
✅ Files skipped from review due to trivial changes (14)
- backend/app/modules/langgraph_nodes/store_and_send.py
- backend/app/utils/store_vectors.py
- backend/app/modules/chat/llm_processing.py
- backend/app/modules/scraper/keywords.py
- backend/app/modules/chat/get_rag_data.py
- backend/main.py
- backend/app/modules/langgraph_nodes/fact_check.py
- backend/app/modules/langgraph_nodes/sentiment.py
- backend/app/modules/langgraph_nodes/error_handler.py
- backend/app/modules/facts_check/web_search.py
- backend/app/utils/fact_check_utils.py
- backend/app/modules/scraper/cleaner.py
- backend/app/modules/bias_detection/check_bias.py
- backend/app/modules/langgraph_builder.py
🧰 Additional context used
🪛 Ruff (0.12.2)
backend/app/db/vector_store.py
36-36: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🪛 markdownlint-cli2 (0.17.2)
README.md
139-139: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
140-140: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
141-141: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
163-163: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
164-164: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
165-165: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (7)
backend/app/modules/scraper/extractor.py (3)
43-44: LGTM: explicit JSON output for trafilatura.
This makes the return type deterministic and simplifies downstream handling.
58-61: LGTM: safe ISO-8601 serialization for publish_date.
Handles None cleanly and preserves tz info when present.
85-86: No direct dict indexing of authors / publish_date found; safe to proceed
A repository-wide search for ['authors'], ["authors"], ['publish_date'], or ["publish_date"] returned no matches, indicating there are no strict dict key lookups that would raise a KeyError. Downstream consumers already guard against missing fields (e.g. using .get()), so the new bs4 + readability fallback can safely omit those keys without breaking. No further changes required.

backend/app/modules/vector_store/chunk_rag_data.py (1)
55-67: LGTM; chunk payload shape aligns with downstream embedder
Chunk has id/text/metadata as expected. No functional concerns.
backend/app/modules/vector_store/embed.py (1)
15-16: Minor formatting improvement is fine
Single f-string is clearer. No functional change.
backend/app/modules/langgraph_nodes/generate_perspective.py (2)
46-52: All prompt variable names correctly aligned with generation_prompt
The keys passed to chain.invoke (cleaned_article, facts, and sentiment) exactly match the placeholders defined in the template ({cleaned_article}, {facts}, {sentiment}), so there is no risk of KeyErrors or empty slots.
60-60: Ignore the suggestion to return only the raw string here; downstream nodes require the full PerspectiveOutput object
judge_perspective immediately does

```python
perspective_obj = state.get("perspective")
text = getattr(perspective_obj, "perspective", "").strip()
```

and will break if perspective is a string. chunk_rag_data.py similarly expects state["perspective"] to have .perspective and .reasoning attributes when building vector-store entries. The proper place to strip out reasoning is at the external API boundary, not in this internal pipeline node.
Likely an incorrect or invalid review comment.
```diff
 except Exception as e:
-    raise RuntimeError("Error occured while "
-                       f"connecting to the index {INDEX_NAME}:{e}")
+    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
```
💡 Verification agent
🧩 Analysis chain
Use exception chaining and fix typos in error message (Ruff B904)
Adopt raise ... from e and correct spelling to improve debuggability and consistency.
Apply this diff:
- raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+ raise RuntimeError(
+ f"Error occurred while connecting to the index {INDEX_NAME}: {e}"
+ ) from eAlso mirror the same improvements for the earlier client initialization exception (Line 13), e.g.:
```python
except Exception as e:
    raise RuntimeError(
        f"Error occurred while initializing Pinecone client: {e}"
    ) from e
```

🏁 Script executed:
```bash
#!/bin/bash
# Display the beginning of the file to locate the client initialization exception around line 13
sed -n '1,50p' backend/app/db/vector_store.py
echo "----"
# Display lines around the reported error at line 36
sed -n '30,70p' backend/app/db/vector_store.py
```

Length of output: 1414
Use exception chaining and correct typos in error messages
Two locations in backend/app/db/vector_store.py need updates to improve debuggability and consistency:
- Client initialization block (around line 13)
- Index connection block (around line 36)
Apply the following diffs:
```diff
--- a/backend/app/db/vector_store.py
@@ -9,7 +9,10 @@ PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
 try:
     # Initialize Pinecone client
     pc = Pinecone(api_key=PINECONE_API_KEY)
-
-except Exception as e:
-    raise RuntimeError(f"Error occured while intialising pinecone client:{e}")
+except Exception as e:
+    raise RuntimeError(
+        f"Error occurred while initializing Pinecone client: {e}"
+    ) from e
```

```diff
--- a/backend/app/db/vector_store.py
@@ -34,7 +37,10 @@ if not pc.has_index(INDEX_NAME):
 try:
     # Connect to the index
     index = pc.Index(INDEX_NAME)
-
-except Exception as e:
-    raise RuntimeError(f"Error occured while connecting to the index {INDEX_NAME}:{e}")
+except Exception as e:
+    raise RuntimeError(
+        f"Error occurred while connecting to the index {INDEX_NAME}: {e}"
+    ) from e
```

These changes:
- Correct “occured” → “occurred” and “intialising” → “initializing”
- Adopt exception chaining (raise … from e) for original traceback preservation
🧰 Tools
🪛 Ruff (0.12.2)
36-36: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In backend/app/db/vector_store.py around lines 13 and 36, update the two
RuntimeError messages to correct typos and preserve original tracebacks: change
"intialising" to "initializing" in the client initialization block and "occured"
to "occurred" in the index connection block, and re-raise using exception
chaining (raise RuntimeError("...") from e) so the original exception is
attached to the new RuntimeError.
```python
llm = ChatGroq(model=my_llm, temperature=0.7)
```
💡 Verification agent
🧩 Analysis chain
Lower temperature for structured output and verify model ID
For schema-constrained generation, a lower temperature materially reduces parse/validation errors. Also, please confirm the Groq model ID is valid.
Apply:
```diff
-llm = ChatGroq(model=my_llm, temperature=0.7)
+llm = ChatGroq(model=my_llm, temperature=0.2)
```

Optionally consider making my_llm configurable via env/config for deployments.
To verify the model ID:
🌐 Web query:
What are the currently supported Groq model IDs for ChatGroq? Is "llama-3.3-70b-versatile" valid?
💡 Result:
Short answer: Yes — llama-3.3-70b-versatile is a valid Groq model ID and is commonly used with ChatGroq. (See Groq's model page and Chat/Playground examples.) (console.groq.com)
Which models are currently available for ChatGroq (representative list, as shown in Groq’s docs and third‑party integrator pages):
- llama-3.3-70b-versatile. (console.groq.com)
- llama-3.1-70b-versatile, llama-3.1-8b-instant. (groq-preview.mintlify.app, langbase.com)
- llama3-70b-8192, llama3-8b-8192, and the Groq preview tool‑use IDs (e.g., llama3-groq-70b-8192-tool-use-preview / llama3-groq-8b-8192-tool-use-preview). (groq-preview.mintlify.app, pub.dev)
- mixtral-8x7b-32768, gemma2-9b-it, qwen/qwen3-32b, deepseek-r1-distill-llama-70b, whisper-large-v3, Moonshot Kimi K2, etc. (groq-preview.mintlify.app, langbase.com)
Notes and recommendation:
- Model availability and exact IDs can change. To get the authoritative, up‑to‑date list you can call Groq’s models endpoint (returns all active model IDs): GET https://api.groq.com/openai/v1/models (example curl shown in Groq docs). (groq-preview.mintlify.app)
- When using LangChain’s ChatGroq, pass the Groq model ID as the model parameter (e.g., model="llama-3.3-70b-versatile"). (python.langchain.com)
If you’d like, I can (a) fetch the live model list right now from Groq’s API for you, or (b) show exact code examples for calling ChatGroq with llama-3.3-70b-versatile in your environment. Which would you prefer?
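For instance, a short script along these lines would print the live IDs, assuming GROQ_API_KEY is set and that the endpoint follows the OpenAI-style list response (both reasonable but unverified assumptions here):

```python
# Sketch: list currently active Groq model IDs via the OpenAI-compatible endpoint.
import os
import requests

resp = requests.get(
    "https://api.groq.com/openai/v1/models",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    timeout=10,
)
resp.raise_for_status()
print([model["id"] for model in resp.json().get("data", [])])
```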
Citations:
- 1: https://console.groq.com/docs/model/llama-3.3-70b-versatile?utm_source=chatgpt.com
- 2: https://console.groq.com/docs/model/llama-3.3-70b-versatile?utm_source=chatgpt.com
- 3: https://groq-preview.mintlify.app/get-started/models?utm_source=chatgpt.com
- 4: https://langbase.com/docs/supported-models-and-providers?utm_source=chatgpt.com
- 5: https://groq-preview.mintlify.app/get-started/models?utm_source=chatgpt.com
- 6: https://pub.dev/documentation/groq_sdk/latest/?utm_source=chatgpt.com
- 7: https://groq-preview.mintlify.app/get-started/models?utm_source=chatgpt.com
- 8: https://langbase.com/docs/supported-models-and-providers?utm_source=chatgpt.com
- 9: https://groq-preview.mintlify.app/get-started/models?utm_source=chatgpt.com
- 10: https://python.langchain.com/api_reference/groq/chat_models/langchain_groq.chat_models.ChatGroq.html?utm_source=chatgpt.com
Lower temperature for structured output (model ID confirmed valid)
The Groq model ID "llama-3.3-70b-versatile" is valid and widely used. To improve schema-constrained generation reliability, please lower the temperature:
File: backend/app/modules/langgraph_nodes/generate_perspective.py
Lines: 16
```diff
-llm = ChatGroq(model=my_llm, temperature=0.7)
+llm = ChatGroq(model=my_llm, temperature=0.2)
```

Optional: Make my_llm configurable via environment variables or your configuration system for easier deployments.
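One possible shape for that, with MODEL_NAME and LLM_TEMPERATURE as suggested (not existing) environment variables:

```python
# Sketch: read model id and temperature from env with defaults (variable names are suggestions).
import os
from langchain_groq import ChatGroq

my_llm = os.getenv("MODEL_NAME", "llama-3.3-70b-versatile")
temperature = float(os.getenv("LLM_TEMPERATURE", "0.2"))

llm = ChatGroq(model=my_llm, temperature=temperature)
```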
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/generate_perspective.py around line 16,
the ChatGroq LLM is instantiated with temperature=0.7 which can produce
non-deterministic, free-form output; lower the temperature (e.g., 0.0–0.2) to
improve schema-constrained/structured outputs and update the call accordingly,
and optionally make the model id and temperature configurable via environment
variables or the existing config system so deployments can adjust model and
temperature without code changes.
```python
    facts_str = "\n".join(
        [
            f"Claim: {f['original_claim']}\n"
            "Verdict: {f['verdict']}\nExplanation: "
            "{f['explanation']}"
            for f in state["facts"]
        ]
    )
```
Critical: f-string interpolation bug in facts_str (‘verdict’ and ‘explanation’ not rendered)
Only the first segment is an f-string; the others are plain strings, so braces are emitted literally. This changes behavior.
Fix and also avoid an intermediate list:
- facts_str = "\n".join(
- [
- f"Claim: {f['original_claim']}\n"
- "Verdict: {f['verdict']}\nExplanation: "
- "{f['explanation']}"
- for f in state["facts"]
- ]
- )
+ facts_str = "\n".join(
+ (
+ f"Claim: {f['original_claim']}\n"
+ f"Verdict: {f['verdict']}\n"
+ f"Explanation: {f['explanation']}"
+ )
+ for f in facts
+    )

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In backend/app/modules/langgraph_nodes/generate_perspective.py around lines 37
to 44, the string building uses only the first segment as an f-string so
"{f['verdict']}" and "{f['explanation']}" are not interpolated; replace the
current list comprehension with a single generator expression that yields a
fully interpolated f-string for each fact and pass that generator directly to
"\n".join to avoid creating an intermediate list and ensure verdict and
explanation are rendered.
```python
        chunks.append(
            {
                "id": f"{article_id}-perspective",
                "text": perspective_obj.perspective,
                "metadata": {
                    "type": "counter-perspective",
                    "reasoning": perspective_obj.reasoning,
                    "article_id": article_id,
                },
            }
        )
```
🛠️ Refactor suggestion
Robustify perspective handling; perspective_data is prepared but unused
You normalize perspective_data = data["perspective"] (Line 16-19) but then use perspective_obj = data["perspective"] and access attributes. If perspective is a dict, the hasattr check fails and attribute access on Line 37/40 will break. Normalize once and read via a unified interface.
Suggested approach outside this hunk (place before the append):
perspective_raw = data["perspective"]
if hasattr(perspective_raw, "dict"):
perspective_raw = perspective_raw.dict()
# Unified access
if isinstance(perspective_raw, dict):
perspective_text = perspective_raw.get("perspective")
reasoning_text = perspective_raw.get("reasoning")
else:
perspective_text = getattr(perspective_raw, "perspective", None)
reasoning_text = getattr(perspective_raw, "reasoning", None)
if not perspective_text or not reasoning_text:
raise ValueError("Perspective object missing required fields")Then update this block as follows:
- "text": perspective_obj.perspective,
+ "text": perspective_text,
"metadata": {
"type": "counter-perspective",
- "reasoning": perspective_obj.reasoning,
+ "reasoning": reasoning_text,
"article_id": article_id,
             },

🤖 Prompt for AI Agents
In backend/app/modules/vector_store/chunk_rag_data.py around lines 34 to 45, the
code uses perspective_obj attributes directly which will break when
data["perspective"] is a dict; normalize the raw perspective first (e.g., assign
perspective_raw = data["perspective"], convert via .dict() if present), then
extract unified perspective_text and reasoning_text by checking isinstance(dict)
and using .get(...) or getattr(...), validate both fields exist (raise
ValueError if missing), and finally use perspective_text and reasoning_text in
the chunks.append metadata instead of accessing perspective_obj.perspective /
.reasoning.
```python
        vectors.append(
            {"id": chunk["id"], "values": embedding, "metadata": chunk["metadata"]}
        )
```
🛠️ Refactor suggestion
Validate presence of 'id' and 'metadata' before use to avoid KeyError
You only validate 'text' earlier, but access 'id' and 'metadata' here. Add validation to the loop at Lines 11-16 to enforce all required keys.
Example adjustment outside this hunk:
# Validate chunk structure
for i, chunk in enumerate(chunks):
if not isinstance(chunk, dict):
raise ValueError(f"Invalid chunk at index {i}: expected dict")
missing = [k for k in ("id", "text", "metadata") if k not in chunk]
if missing:
raise ValueError(f"Invalid chunk at index {i}: missing keys {missing}")🤖 Prompt for AI Agents
In backend/app/modules/vector_store/embed.py around lines 11 to 16 (and
affecting lines 23-25), the loop only validates 'text' but later accesses 'id'
and 'metadata', risking KeyError; update the loop to first ensure each chunk is
a dict and then check for the presence of all required keys ("id", "text",
"metadata") for each chunk (reporting the chunk index and the missing keys),
raising a clear ValueError if any are missing so the later vectors.append(...)
can safely use chunk["id"] and chunk["metadata"].
Summary by CodeRabbit: New Features, Bug Fixes / Reliability, Documentation, Refactor.